This notebook builds a graph of the collaboration between contributors to Git repositories (here, the repositories of the codeforamerica organization). Nodes are authors, and the edge between two authors is weighted by the number of parent/child relationships between their commits.
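As a minimal illustration of that idea, the sketch below uses a few hypothetical commits (made-up author names and short SHAs, not data from any real repository) and counts, for each unordered pair of distinct authors, how many commits by one are direct parents of commits by the other. It uses only the standard library; the notebook itself builds the same kind of weighted graph with networkx.

```python
from collections import Counter

# Hypothetical toy history: (author, commit sha, list of parent shas).
commits = [
    ("alice", "a1", []),
    ("bob", "b1", ["a1"]),
    ("alice", "a2", ["b1"]),
    ("carol", "c1", ["a2", "b1"]),  # a merge commit with two parents
]

# Map every commit sha to the author who made it.
author_of = {sha: author for author, sha, _ in commits}

# Count parent/child links between distinct authors; a frozenset makes the
# pair unordered, matching an undirected graph.
weights = Counter()
for author, _sha, parents in commits:
    for parent in parents:
        parent_author = author_of.get(parent)
        if parent_author is not None and parent_author != author:
            weights[frozenset((author, parent_author))] += 1

for pair, weight in sorted(weights.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), weight)
# prints:
# ['alice', 'bob'] 2
# ['alice', 'carol'] 1
# ['bob', 'carol'] 1
```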
In [13]:
%matplotlib inline
from bigbang.git_repo import GitRepo
from bigbang import repo_loader
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd
In [14]:
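# Load the repositories of the codeforamerica GitHub organization
repos = repo_loader.get_org_repos("codeforamerica")
# Combine them into a single multi-repository object
repo = repo_loader.get_multi_repo(repos=repos)
# pandas DataFrame with one row per commit
full_info = repo.commit_data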
Nodes will be Author objects, each of which holds a list of Commit objects.
In [15]:
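class Commit:
    """A single commit: its message, SHA, and parent commit(s)."""

    def __init__(self, message, hexsha, parents):
        self.message = message
        self.hexsha = hexsha
        self.parents = parents

    def __repr__(self):
        # Show only the first four words of the commit message.
        return " ".join(self.message.split(" ")[:4])


class Author:
    """A contributor (a node in the graph) and the commits they made."""

    def __init__(self, name, commits):
        self.name = name
        self.commits = commits
        self.number_of_commits = len(commits)  # count what was passed in

    def add_commit(self, commit):
        self.commits.append(commit)
        self.number_of_commits += 1

    def __repr__(self):
        return self.name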
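The same data model can also be sketched with dataclasses, deriving `number_of_commits` from the list instead of maintaining a separate counter. This is an alternative, not the notebook's own code; it assumes `parents` is a list of SHA strings. Note `eq=False`: a dataclass with the default `eq=True` sets `__hash__` to `None`, which would make `Author` objects unusable as networkx nodes, whereas `eq=False` keeps identity-based hashing like a plain class.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # keep identity equality/hashing, like a plain class
class Commit:
    message: str
    hexsha: str
    parents: List[str]  # assumed: a list of parent SHA strings

    def __repr__(self) -> str:
        # Show only the first four words of the commit message.
        return " ".join(self.message.split(" ")[:4])


@dataclass(eq=False)  # hashable by identity, so it can be a graph node
class Author:
    name: str
    commits: List[Commit] = field(default_factory=list)

    @property
    def number_of_commits(self) -> int:
        # Derived from the list, so it can never drift out of sync.
        return len(self.commits)

    def add_commit(self, commit: Commit) -> None:
        self.commits.append(commit)

    def __repr__(self) -> str:
        return self.name


alice = Author("alice")
alice.add_commit(Commit("Fix typo in README file", "a1", []))
alice.add_commit(Commit("Add tests", "a2", ["a1"]))
print(alice, alice.number_of_commits, alice.commits)
# prints: alice 2 [Fix typo in README, Add tests]
```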
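We build a list of authors, keeping track of committer names so that each author is added only once. When we find a commit by an author we have already stored, we add it to that author's list of commits. Authors are identified by the `Committer Name` column, so in git terms these are committers, who can differ from a commit's original author.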
In [16]:
def get_authors():
    # Map committer name -> Author, so each author is created only once and
    # can be found without scanning a list (dicts keep insertion order).
    authors = {}
    for _, row in full_info.iterrows():
        name = row["Committer Name"]
        commit = Commit(row["Commit Message"], row["HEXSHA"], row["Parent Commit"])
        if name not in authors:
            authors[name] = Author(name, [commit])
        else:
            authors[name].add_commit(commit)
    return list(authors.values())
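To try the name-keyed grouping without fetching anything from GitHub, here is a small stand-alone variant that takes the rows as an argument instead of reading the global `full_info`. The rows below are hypothetical dictionaries that only reuse the column names the notebook reads; real rows would come from `full_info.iterrows()`. `group_by_committer` is an illustrative helper, not part of bigbang.

```python
from collections import defaultdict

# Hypothetical rows using the same column names as repo.commit_data.
rows = [
    {"Committer Name": "alice", "HEXSHA": "a1", "Parent Commit": [], "Commit Message": "Initial commit"},
    {"Committer Name": "bob", "HEXSHA": "b1", "Parent Commit": ["a1"], "Commit Message": "Add docs"},
    {"Committer Name": "alice", "HEXSHA": "a2", "Parent Commit": ["b1"], "Commit Message": "Fix docs"},
]


def group_by_committer(rows):
    """Map each committer name to the SHAs of their commits, in first-seen order."""
    grouped = defaultdict(list)  # one bucket per name; O(1) lookup per row
    for row in rows:
        grouped[row["Committer Name"]].append(row["HEXSHA"])
    return dict(grouped)


print(group_by_committer(rows))
# prints: {'alice': ['a1', 'a2'], 'bob': ['b1']}
```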
We create the graph by adding an edge between two authors whenever one of them has a commit that is a parent of a commit by the other. If that edge already exists, we increase its weight by one instead.
In [17]:
def make_graph(nodes):
    G = nx.Graph()
    for author in nodes:
        for commit in author.commits:
            for other in nodes:
                if other is author:
                    continue  # only link *different* authors (no self-loops)
                for other_commit in other.commits:
                    # commit is a parent of other_commit: link the two authors
                    if commit.hexsha in other_commit.parents:
                        if G.has_edge(author, other):
                            G[author][other]["weight"] += 1
                        else:
                            G.add_edge(author, other, weight=1)
    return G
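The nested loops above compare every commit with every other commit, which grows quadratically with the number of commits. A sketch of a faster alternative (not the notebook's code) first indexes which author made each SHA, then looks up each parent directly. It assumes `Parent Commit` holds a list of parent SHA strings, which the notebook does not confirm, and uses plain names and hypothetical data so it runs on its own; `author_edge_weights` is an illustrative helper.

```python
from collections import Counter


def author_edge_weights(commits_by_author):
    """Count parent/child links between distinct authors.

    commits_by_author: dict mapping author name -> list of (hexsha, parents),
    where parents is assumed to be a list of parent SHA strings.
    Returns a Counter keyed by (author_a, author_b) with author_a < author_b.
    """
    # First pass: index which author made each commit.
    author_of = {
        sha: author
        for author, commits in commits_by_author.items()
        for sha, _parents in commits
    }
    weights = Counter()
    # Second pass: each parent is found in O(1) instead of scanning
    # every other author's commits.
    for author, commits in commits_by_author.items():
        for _sha, parents in commits:
            for parent in parents:
                parent_author = author_of.get(parent)
                if parent_author is not None and parent_author != author:
                    weights[tuple(sorted((author, parent_author)))] += 1
    return weights


# Hypothetical data: bob builds on alice twice, carol merges both.
data = {
    "alice": [("a1", []), ("a2", ["b1"])],
    "bob": [("b1", ["a1"])],
    "carol": [("c1", ["a2", "b1"])],
}
print(author_edge_weights(data))
```

Each `(a, b), w` pair in the result can then be added to a networkx graph with `G.add_edge(a, b, weight=w)`, giving an undirected weighted graph between distinct authors.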
In [ ]:
nodes = get_authors()
G = make_graph(nodes)
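# Force-directed layout; spring_layout uses the "weight" edge attribute by default
pos = nx.spring_layout(G, iterations=100)
nx.draw(G, pos, font_size=8, with_labels=False)
# Uncomment to label each node with the author's name:
# nx.draw_networkx_labels(G, pos);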